Development of a novel combined nomogram model integrating deep learning radiomics to diagnose IgA nephropathy clinically

Abstract This study aimed to develop and validate a combined nomogram model based on superb microvascular imaging (SMI)-based deep learning (DL), radiomics characteristics, and clinical factors for noninvasive differentiation between immunoglobulin A nephropathy (IgAN) and non-IgAN. We prospectively enrolled patients with chronic kidney disease who underwent renal biopsy from May 2022 to December 2022 and performed ultrasound and SMI the day before renal biopsy. The selected patients were randomly divided into training and testing cohorts in a 7:3 ratio. We extracted DL and radiomics features from the two-dimensional ultrasound and SMI images. A combined nomogram model was developed by combining the predictive probability of DL with clinical factors using multivariate logistic regression analysis. The proposed model’s utility was evaluated using receiver operating characteristic (ROC) curve, calibration, and decision curve analyses. In total, 120 patients with primary glomerular disease were included, 84 in the training cohort and 36 in the testing cohort. In the testing cohort, the area under the ROC curve (AUC) of the radiomics model was 0.816 (95% CI: 0.663–0.968), and the AUC of the DL model was 0.844 (95% CI: 0.717–0.971). The nomogram model combined with independent clinical risk factors (IgA and hematuria) showed strong discrimination, with an AUC of 0.884 (95% CI: 0.773–0.996) in the testing cohort. Decision curve analysis verified the clinical practicability of the combined nomogram. The combined nomogram model based on SMI can accurately and noninvasively distinguish IgAN from non-IgAN and help physicians make clearer patient treatment plans.


Introduction
Immunoglobulin A (IgA) nephropathy (IgAN) is the most common form of primary glomerulonephritis worldwide, accounting for more than 40% of all biopsies in China, and is the main cause of chronic kidney disease (CKD) and renal failure [1][2][3]. IgAN typically manifests as asymptomatic seizures, accompanied by proteinuria, hypoalbuminemia, edema, and hyperlipidemia. However, these symptoms are not sufficient for the diagnosis of primary glomerulopathy [4]. Renal biopsy is the gold standard for diagnosing IgAN, but it has many disadvantages. As an invasive examination, renal biopsy may cause multiple complications such as perirenal hematoma, pain, and infection [5][6][7]. Because of the limitations of the puncture technique, some patients fear and refuse renal biopsy or have contraindications to puncture; for these and other reasons, biopsy is not always performed, which prevents patients from receiving the best treatment. Therefore, there is an urgent need for a simple and noninvasive IgAN diagnosis model. Although some studies have attempted to distinguish IgAN from non-IgAN noninvasively before biopsy, diagnostic efficiency remains low [8][9][10][11].
The well-known advantages of ultrasound imaging, such as low cost, noninvasiveness, and lack of ionizing radiation, make it an attractive choice for detecting kidney diseases [6]. Ultrasound imaging mainly comprises two basic modes: B-mode ultrasound (BUS) and color Doppler flow imaging (CDFI). Recently, superb microvascular imaging (SMI) has been applied as a relatively new noninvasive flow imaging mode, which uses an adaptive wall filter different from that of CDFI and minimizes flash artifacts. Even without any contrast agent, SMI can improve slow-flow visibility and the sensitivity of small-vessel signal detection [12,13]. Compared with traditional color flow imaging, SMI is more sensitive in detecting low-velocity blood flow and displays more microvascular information [14][15][16]. It also shows good consistency with contrast-enhanced ultrasound (CEUS) [16][17][18]. To a large extent, the diagnostic performance of ultrasound depends on the radiologist's clinical and professional knowledge. Subjective image interpretation and the lack of effective quantification are the main difficulties that ultrasound diagnosis faces [19].
Artificial intelligence is widely used because of its excellent performance in image-recognition tasks. It can effectively improve the diagnostic accuracy of medical image interpretation and increase the objectivity of diagnosis [20,21]. Radiomics, which converts medical images into high-throughput quantitative features, has received increased attention [22,23]. Recently, we found that radiomics applied to ultrasound images can effectively identify subclinical changes, which may serve as biomarkers [24,25]. Deep learning (DL) can directly couple feature extraction, feature selection, and prediction model construction into a neural network model through end-to-end learning from medical images, thus greatly simplifying the process of radiomics analysis [26,27]. The combination of DL classification networks and radiomics frameworks in integrated systems has become an emerging trend in achieving good performance in clinical tasks [28][29][30]. Based on the above, we established a new combined nomogram model combining DL, radiomics, and clinical indicators based on SMI imaging. We aimed to distinguish IgAN from non-IgAN noninvasively, without histopathological data.

Patients
This prospective research conformed to the guidelines of the Declaration of Helsinki and was approved by the Internal Review Committee (PJ2022-11-29). Written informed consent was obtained from each study participant.
We consecutively recruited 120 patients with biopsy-confirmed primary glomerular disease as research subjects between May 2022 and December 2022. We assigned them to two cohorts: the IgAN and the non-IgAN cohorts. Renal biopsy was performed by two experienced nephrologists within 3 days after the renal ultrasound measurement. The right kidney was selected for renal biopsy. The inclusion criteria were as follows: 1) diagnosis of primary glomerular disease confirmed by renal puncture biopsy, with at least ten glomeruli in the specimen visible under a light microscope; 2) age > 18 years; and 3) < 5 cm distance between the kidney and the skin. The exclusion criteria were as follows: 1) secondary glomerular disease; 2) diagnosis of infection, tumor, or autoimmune disease; 3) renal artery stenosis or urinary tract obstruction; and 4) presence of kidney cysts or tumor. The process for case selection is shown in Figure 1. Patients were randomly assigned to a training cohort (70%) or a testing cohort (30%).
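The 7:3 assignment can be illustrated with a short sketch. The text only states that the assignment was random; splitting within each diagnosis group (stratification), the seed, and the function name `stratified_split` are assumptions made here for illustration. With the group sizes reported in the Results (54 IgAN, 66 non-IgAN), a stratified 70% split happens to reproduce the reported cohort composition.

```python
import random

def stratified_split(labels, train_fraction=0.7, seed=0):
    """Return (train_ids, test_ids), splitting each class at train_fraction.

    Stratification is an assumption; the paper only says "randomly assigned".
    """
    rng = random.Random(seed)
    by_class = {}
    for idx, label in enumerate(labels):
        by_class.setdefault(label, []).append(idx)
    train_ids, test_ids = [], []
    for ids in by_class.values():
        rng.shuffle(ids)
        n_train = round(len(ids) * train_fraction)
        train_ids.extend(ids[:n_train])
        test_ids.extend(ids[n_train:])
    return sorted(train_ids), sorted(test_ids)

# 54 IgAN (label 1) and 66 non-IgAN (label 0), as reported in the Results.
labels = [1] * 54 + [0] * 66
train_ids, test_ids = stratified_split(labels)
n_train_igan = sum(labels[i] for i in train_ids)
n_test_igan = sum(labels[i] for i in test_ids)
print(len(train_ids), len(test_ids), n_train_igan, n_test_igan)
```

A purely random split without stratification would give similar but not necessarily identical class proportions in the two cohorts.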
Univariate logistic regression analysis was used to analyze the association between clinical parameters and IgAN. Multivariate logistic regression analysis was then performed on the factors found to be relevant (p < 0.05) in the univariate analysis to determine the independent predictive factors significantly related to IgAN.

Ultrasonic examination
An experienced ultrasound physician (with 16 years of experience in renal ultrasound) performed image acquisition on the day before the patient underwent renal puncture using a Canon Aplio 700 ultrasound system (Canon Medical Systems, Otawara, Japan) with a 3.5 MHz linear probe (i8C1), after fasting for more than 6 h and during breath-holding at the end of inspiration. In all participants, the right kidney was examined by ultrasound, as this was the kidney that underwent biopsy. With the patient lying on the left side, the ultrasound probe was gently positioned on the right abdomen in an oblique projection, and the 3.5 MHz probe was used for the kidney ultrasound examination. The probe was placed on the posterior axillary line, the position and angle of the probe were adjusted to obtain a longitudinal image of the kidney, and the two-dimensional images were consecutively collected.
The ultrasound scanner was then switched to monochrome SMI mode. The SMI-specific region-of-interest frame was placed over the whole kidney. The mechanical index was 1.6, the frame rate was 25-35 frames/s, the dynamic range was 65-75 dB, and the SMI speed was 3.5 cm/s. After obtaining the optimal blood flow section of the kidney, the images were continuously collected.

Radiomics feature extraction
The kidney region of interest (ROI) on ultrasound was manually segmented using ITK-SNAP software (version 3.8.0, http://www.itksnap.org/pmwiki/pmwiki.php?n=Downloads.SNAP3). Radiomics feature extraction was then conducted using PyRadiomics (version 2.2.0, http://www.radiomics.io/pyradiomics.htm), which can extract high-throughput features from ultrasound images using various hard-coded feature algorithms. To assess the repeatability of the radiomics features, the intraclass correlation coefficient (ICC) was calculated from the ultrasound images of 50 randomly selected patients. Two sonographers experienced in abdominal ultrasound imaging, sonographer 1 (with 9 years of experience) and sonographer 2 (with 7 years of experience), performed the ROI segmentation independently, blinded to each other, and the inter-observer consistency of the features was evaluated. After a month, sonographer 1 repeated the segmentation task to evaluate the intra-observer consistency of the features. Features with an ICC > 0.75 were considered to have high repeatability. Sonographer 1 performed the remaining image segmentation alone.
The radiomics feature selection process consisted of several steps. First, features with an ICC > 0.75 in the training cohort were selected. Second, the Mann-Whitney U-test was used to identify features that differed significantly between the IgAN and non-IgAN groups in the training cohort. Third, the Spearman correlation coefficient was calculated for each pair of features, and if the correlation coefficient was > 0.9, only the feature with the higher AUC was retained. Finally, the LASSO algorithm was applied to reduce the dimensionality of the feature set, and the features with non-zero coefficients were selected. The entire pipeline integrated statistical testing and robust feature selection algorithms to identify the most significant features for subsequent model construction.
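The third step (removing one feature from each highly correlated pair) can be sketched as follows. This is a minimal, dependency-free illustration, not the authors' code: the greedy ordering by AUC, the function names, and the toy feature values are all assumptions, and the AUC is computed from the Mann-Whitney rank-sum statistic.

```python
def ranks(values):
    """1-based ranks; tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result

def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

def spearman(x, y):
    """Spearman rho = Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))

def auc(scores, labels):
    """Single-feature AUC from the rank-sum statistic, direction-agnostic."""
    r = ranks(scores)
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    rank_sum = sum(ri for ri, y in zip(r, labels) if y == 1)
    a = (rank_sum - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
    return max(a, 1 - a)

def drop_redundant(features, labels, threshold=0.9):
    """Visit features from highest to lowest AUC; keep a feature only if its
    |rho| with every already-kept feature is at most the threshold."""
    aucs = {name: auc(vals, labels) for name, vals in features.items()}
    kept = []
    for name in sorted(features, key=lambda n: -aucs[n]):
        if all(abs(spearman(features[name], features[k])) <= threshold for k in kept):
            kept.append(name)
    return kept

# Toy data (hypothetical): f_b is almost a monotone copy of f_a.
labels = [0, 0, 0, 0, 1, 1, 1, 1]
features = {
    "f_a": [1, 2, 3, 4, 5, 6, 7, 8],
    "f_b": [1, 2, 3, 5, 4, 6, 7, 8],
    "f_c": [5, 1, 7, 3, 2, 8, 4, 6],
}
kept = drop_redundant(features, labels)
print(kept)
```

In this toy case `f_b` is discarded because its correlation with `f_a` exceeds 0.9 and its AUC is lower, while the weakly correlated `f_c` is retained for the subsequent LASSO step.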

Deep learning feature extraction
Before the images were input into the deep learning framework, a sonographer experienced in kidney ultrasound cropped all the images to contain the entire kidney mask area. For each patient, the images with the largest kidney ROI were cropped. The cropped kidney mask maintained the complete edge without exceeding the image boundary. Because ultrasound images may be noisy and contain artifacts that interfere with the deep learning process, we used a traditional wavelet-based denoising algorithm to reduce noise and enhance image features. All the ultrasound images were then resized to a fixed size of 224 × 224 pixels, and the pixel values were normalized to a fixed range of [0, 1] through the z-score method to improve consistency and reduce complexity. Before training the deep learning model, we performed image augmentation on all the selected images, including rotation (90°, 180°, and 270°), flipping (horizontal and vertical), and color jittering, to reflect the diversity of real-world data. Image augmentation generated additional training data from the original images, which can improve the robustness of the deep learning model, reduce overfitting, and improve generalization performance.
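The geometric augmentations listed above (three rotations and two flips) can be sketched on a tiny image represented as a list of rows. This is only an illustration of the transformations; the function names are hypothetical, a real pipeline would operate on 224 × 224 arrays with an imaging library, and color jittering is omitted here.

```python
def rotate90(img):
    """Rotate a 2D image (list of rows) 90 degrees clockwise."""
    return [list(row) for row in zip(*img[::-1])]

def flip_horizontal(img):
    """Mirror left-right."""
    return [row[::-1] for row in img]

def flip_vertical(img):
    """Mirror top-bottom."""
    return [list(row) for row in img[::-1]]

def augment(img):
    """Return the five geometric variants named in the text."""
    r90 = rotate90(img)
    r180 = rotate90(r90)
    r270 = rotate90(r180)
    return [r90, r180, r270, flip_horizontal(img), flip_vertical(img)]

img = [[1, 2],
       [3, 4]]
variants = augment(img)
for v in variants:
    print(v)
```

Each original image therefore yields five additional geometric variants before any intensity-based augmentation is applied.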
For our task, we constructed a siamese neural network, which consisted of two identical neural networks that shared the same weights. The two networks were fed with two datasets (US images and SMI images) for the same classification task. ResNet-101 has been proven to be a feature extraction tool with good stability and performance [31]. It has 101 layers, including convolutional, pooling, activation, and batch normalization layers. ResNet-101 was used as the base model for feature extraction in our siamese neural network. Its skip connections learn residual mappings and allow gradients to flow more easily through the network, which helps the model learn the important features from the input images and mitigates overfitting. Transfer learning reduces training time and improves model performance on new tasks by transferring parameters from models already trained on similar tasks. Since our image dataset was relatively small, a transfer learning strategy was used in our siamese neural network. The ResNet-101 model was pre-trained on the ImageNet dataset, which contains millions of images in thousands of classes and is widely used for classification tasks. Then a total of 1102 SMI images and 1102 US images were input as original images into the pre-trained model for model training.
During the transfer learning process, the pre-trained weights of the ResNet-101 model were fine-tuned on the ultrasound image dataset for our classification task. Considering that the front layers had learned general features applicable to many tasks, we froze the weights of the pre-trained front layers. We trained the last ten layers of the ResNet-101 network so that the later layers could learn more specific and complex, task-dependent features. The learning rate was set to 0.001 to control the step size during gradient descent optimization. To ensure relatively fast convergence, the batch size was set to 64 during model training. We also used regularization methods, including dropout and weight decay, to help prevent overfitting. A dropout rate of 0.1 and a weight decay of 0.0001 were used in the ResNet-101 model. The number of epochs was set to 200, which determined the number of times the model iterated over the entire ultrasound image dataset. We collected the feature vectors from the average-pooling layer of the model as the deep learning features and used an ensemble learning method to calculate the average feature value for each patient. Finally, 2048 SMI-based and 2048 US-based deep learning features were obtained. The image dataset of the testing cohort was also input into the trained model to extract deep learning features for further analysis.
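The per-patient averaging of image-level feature vectors can be sketched as below. This is an illustrative reading of the step, not the authors' implementation: the function name, the patient identifiers, and the toy 3-dimensional vectors (standing in for the 2048-dimensional pooled features) are hypothetical, and concatenating the SMI and US vectors into one 4096-dimensional vector per patient is an assumption consistent with the feature counts reported in the Results.

```python
from collections import defaultdict

def average_by_patient(image_features):
    """image_features: list of (patient_id, feature_vector) pairs.

    Returns a dict mapping patient_id to the element-wise mean vector
    over all of that patient's images.
    """
    grouped = defaultdict(list)
    for patient_id, vec in image_features:
        grouped[patient_id].append(vec)
    return {
        pid: [sum(col) / len(vecs) for col in zip(*vecs)]
        for pid, vecs in grouped.items()
    }

# Toy image-level features (hypothetical values, 3 dimensions instead of 2048).
smi_image_features = [
    ("P01", [1.0, 2.0, 3.0]),
    ("P01", [3.0, 4.0, 5.0]),
    ("P02", [0.0, 0.0, 6.0]),
]
us_image_features = [
    ("P01", [10.0, 20.0, 30.0]),
    ("P02", [1.0, 1.0, 1.0]),
    ("P02", [3.0, 5.0, 7.0]),
]

smi = average_by_patient(smi_image_features)
us = average_by_patient(us_image_features)

# Assumed fusion: concatenate the SMI-based and US-based vectors per patient.
combined = {pid: smi[pid] + us[pid] for pid in smi}
print(combined["P01"])
```

Averaging gives every patient a single fixed-length vector regardless of how many images were acquired, which is what the downstream feature selection and SVM steps require.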

Establishment of clinical nomogram model for DL radiomics
In our study, several feature selection methods were used to select the optimal features, including the ICC method, the U-test, and LASSO (least absolute shrinkage and selection operator) with 10-fold cross-validation. In the training cohort, we performed feature selection separately for the radiomics features and the deep learning features. Two support vector machine (SVM) models were trained to construct a radiomics model (Rad_model) and a deep learning model (DL_model). The output scores of these two models were combined with clinical characteristics for further univariate and multivariate logistic regression analysis. A combined nomogram was then developed based on the significant clinical characteristics, the Rad_model score, and the DL_model score. The calibration curve was used to evaluate the calibration of the nomogram. The workflow of our study is shown in Figure 2.
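How a fitted logistic regression translates into a nomogram can be sketched as follows. The coefficients, intercept, and predictor ranges below are hypothetical placeholders chosen only for illustration (the fitted values are not given in the text); the sketch shows the usual convention in which the predictor with the largest effect over its range spans 0 to 100 points.

```python
import math

# Hypothetical coefficients and intercept, for illustration only.
COEFFS = {"dl_score": 2.0, "rad_score": 1.5, "iga": 0.8, "hematuria": 1.0}
INTERCEPT = -3.0

def predict_probability(patient):
    """Logistic model: p = 1 / (1 + exp(-(b0 + sum(b_i * x_i))))."""
    linear = INTERCEPT + sum(COEFFS[k] * patient[k] for k in COEFFS)
    return 1.0 / (1.0 + math.exp(-linear))

def nomogram_points(ranges):
    """Maximum points per predictor: |coefficient| * range, rescaled so that
    the predictor with the largest effect is worth 100 points."""
    span = {k: abs(COEFFS[k]) * (hi - lo) for k, (lo, hi) in ranges.items()}
    top = max(span.values())
    return {k: 100.0 * s / top for k, s in span.items()}

# Hypothetical predictor ranges (model scores in [0, 1], IgA in g/L, hematuria 0/1).
ranges = {"dl_score": (0, 1), "rad_score": (0, 1), "iga": (0, 5), "hematuria": (0, 1)}
points = nomogram_points(ranges)

high_risk = {"dl_score": 0.8, "rad_score": 0.7, "iga": 3.0, "hematuria": 1}
low_risk = {"dl_score": 0.2, "rad_score": 0.1, "iga": 1.0, "hematuria": 0}
p_high = predict_probability(high_risk)
p_low = predict_probability(low_risk)
print(points, round(p_high, 3), round(p_low, 3))
```

Reading a nomogram is equivalent to summing each predictor's points and mapping the total back through the logistic function, so the graphical tool and the regression equation give the same probability.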

Statistical analysis
All statistical analyses were performed using SPSS (version 25.0; IBM Corp., Armonk, NY, USA) and Python 2.7 (Python Software Foundation, Beaverton, OR, USA). Quantitative data with a normal distribution are expressed as mean ± standard deviation, while quantitative data with a non-normal distribution are expressed as median and interquartile range. Categorical variables are expressed as numbers and percentages. The chi-square test, the two-independent-samples Student's t-test, and the Mann-Whitney U test were used for univariate analysis. Statistical significance was set at p < 0.05. The diagnostic performances of the Rad_model, the DL_model, and the combined nomogram were evaluated based on the area under the curve (AUC). To evaluate the clinical effectiveness of the models, decision curve analysis (DCA) was performed by calculating the net benefit at different probability thresholds in the training and testing cohorts.
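The net benefit used in decision curve analysis can be written out explicitly. The sketch below uses the standard definition, net benefit = TP/n − FP/n × pt/(1 − pt) at threshold probability pt, together with the "treat all" reference strategy; the function names and the toy predictions are hypothetical.

```python
def net_benefit(probs, labels, threshold):
    """Net benefit of acting on patients whose predicted probability >= threshold."""
    n = len(labels)
    tp = sum(1 for p, y in zip(probs, labels) if p >= threshold and y == 1)
    fp = sum(1 for p, y in zip(probs, labels) if p >= threshold and y == 0)
    return tp / n - fp / n * threshold / (1 - threshold)

def net_benefit_treat_all(labels, threshold):
    """Reference strategy: treat every patient as positive."""
    prevalence = sum(labels) / len(labels)
    return prevalence - (1 - prevalence) * threshold / (1 - threshold)

# Toy predictions (hypothetical): 4 IgAN and 4 non-IgAN patients.
probs = [0.9, 0.8, 0.7, 0.4, 0.3, 0.2, 0.6, 0.1]
labels = [1, 1, 1, 1, 0, 0, 0, 0]

nb_model = net_benefit(probs, labels, threshold=0.5)
nb_all = net_benefit_treat_all(labels, threshold=0.5)
print(nb_model, nb_all)
```

A decision curve plots these quantities over a range of thresholds; a model is clinically useful at a given threshold when its net benefit exceeds both the "treat all" value and zero (the "treat none" strategy).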

Results
Patient and baseline characteristics are shown in Table 1. This study recruited 120 patients with primary glomerular disease. A total of 3060 ultrasound images were obtained, including 1530 SMI images and 1530 US images. Among the patients, 54 had IgAN and 66 had non-IgAN, which consisted of 30 cases of membranous nephropathy, 10 cases of minimal degenerative nephropathy, 5 cases of glomeruloid cell disease, 3 cases of focal segmental glomerulosclerosis, and 18 cases of mesangial proliferative glomerulonephritis. In the training cohort, a total of 2204 images were obtained, including 1102 SMI images and 1102 US images. Of the 84 patients, 38 (45.2%) had IgAN and 46 (54.8%) had non-IgAN, as determined by pathology. In the testing cohort, a total of 856 images were obtained, including 428 SMI images and 428 US images. Of the 36 patients, 16 (44.4%) had IgAN and 20 (55.6%) had non-IgAN. There were no significant differences in baseline characteristics between the training cohort and the testing cohort.

Performance of radiomics model
A total of 3,384 radiomics features were extracted from the ultrasound images. After the inter- and intra-observer consistency analyses and the Mann-Whitney U-test, 1728 features were found to differ significantly between the two groups. These features were further selected using LASSO, which resulted in an optimal feature set of eight radiomics features (Figure 3). The radiomics model was then established based on these features. The predictive performance of the radiomics model for IgAN was assessed using the ROC curve, which showed an AUC of 0.865 for the training cohort and 0.816 for the testing cohort (Figure 4). In the training cohort, the radiomics model achieved an accuracy of 79.76%, a sensitivity of 80.44%, and a specificity of 78.95% in predicting IgAN. In the testing cohort, the accuracy, sensitivity, and specificity of the radiomics model were 77.78%, 75.00%, and 80.00%, respectively (Table 2).

Performance of DL model
Using the siamese neural network, 4096 deep learning features were extracted from each patient's images, which included traditional US images and SMI images. Feature selection followed the same steps as in the radiomics analysis, and the most significant deep learning features were selected. Ultimately, seven deep features were selected to build the deep learning model (Figure 5). The DL_model achieved AUCs of 0.994 and 0.844 in the training and testing cohorts, respectively (Figure 4). In the training cohort, the DL_model achieved an accuracy of 88.10%, a sensitivity of 84.78%, and a specificity of 92.11% in predicting IgAN. In the testing cohort, the accuracy, sensitivity, and specificity of the DL_model were 69.44%, 75.00%, and 65.00%, respectively (Table 2).

Performance of clinical combinatorial nomogram model for DL radiomics
We developed a combined nomogram to noninvasively differentiate IgAN from non-IgAN. The nomogram model combines four features: the DL_model score, the Rad_model score, IgA, and hematuria. Each feature contributed to the nomogram output score according to its coefficient, and the total score of each patient was then calculated to determine the probability of IgAN. The AUCs for the training and testing cohorts were 0.920 and 0.884, respectively (Figure 4). In the training cohort, the model achieved an accuracy of 92.86%, a sensitivity of 93.48%, and a specificity of 92.11% in predicting IgAN. In the testing cohort, the accuracy, sensitivity, and specificity of the nomogram model were 80.56%, 75.00%, and 85.00%, respectively (Table 2). The calibration plot demonstrated good agreement between the probability predicted by the nomogram and the actual IgAN outcomes (Figure 6). Additionally, the decision curve analysis revealed that the nomogram model yielded a net benefit over a risk threshold range of 0-1 in the training cohort and 0-0.83 in the testing cohort (Figure 7).
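The testing-cohort accuracy, sensitivity, and specificity reported for the three models can be checked against the cohort composition (16 IgAN and 20 non-IgAN patients). The confusion-matrix counts below are not reported in the text; they are inferred here from the published percentages and are the only integer counts consistent with them.

```python
def metrics(tp, fn, tn, fp):
    """Accuracy, sensitivity, and specificity in percent, rounded to 2 decimals."""
    total = tp + fn + tn + fp
    return {
        "accuracy": round(100 * (tp + tn) / total, 2),
        "sensitivity": round(100 * tp / (tp + fn), 2),
        "specificity": round(100 * tn / (tn + fp), 2),
    }

# Inferred counts (assumption): IgAN is the positive class, 16 positives, 20 negatives.
testing = {
    "Rad_model": metrics(tp=12, fn=4, tn=16, fp=4),
    "DL_model": metrics(tp=12, fn=4, tn=13, fp=7),
    "nomogram": metrics(tp=12, fn=4, tn=17, fp=3),
}
for name, m in testing.items():
    print(name, m)
```

Under these inferred counts, all three models detect the same number of IgAN cases in the testing cohort (12 of 16), and the gain of the nomogram comes from fewer false positives among the 20 non-IgAN patients.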

Discussion
IgAN is the most common primary glomerular disease worldwide and the main cause of renal failure requiring renal replacement therapy [32,33]. Early diagnosis is crucial for preventing IgAN from progressing to end-stage kidney disease. Renal biopsy is the gold standard for diagnosing IgAN but is frequently not performed for various reasons, resulting in underdiagnosis of IgAN [34]. Therefore, there is an urgent need for a simple and noninvasive IgAN diagnostic model. In this prospective study, we used DL and radiomics to extract features from patients' SMI and two-dimensional ultrasound images. We combined them with clinical features to establish a nomogram model to distinguish IgAN from non-IgAN noninvasively.
The model showed good discrimination in the training (AUC = 0.981) and testing (AUC = 0.884) sets. This study is the first to use a combined nomogram model based on SMI to distinguish IgAN from non-IgAN noninvasively. Thrombotic microangiopathy is common in IgAN [33,35]. We chose SMI to reflect the kidney's blood flow because SMI provides rich numerical and visual information [16,36]. Furthermore, like CEUS microflow imaging, SMI displays the vascular architecture and blood flow grade [16,37]. We extracted radiomics and DL features from the patients' images; the selected features originated from both image types, indicating that two-dimensional ultrasound and the display of renal blood flow could be potential biological markers for distinguishing between IgAN and non-IgAN. DL radiomics automatically extracts high-throughput quantitative features from medical images and then comprehensively quantifies intrarenal heterogeneity, showing better performance than radiomics or DL methods alone [30]. In this study, we extracted eight radiomics features and seven DL features. These selected imaging features are not redundant but complementary.
After multivariate analysis of clinical laboratory indicators, IgA and urine occult blood were independent predictors for distinguishing IgAN from non-IgAN, consistent with previous studies [10,38]. Furthermore, the combined model we developed integrates the DL and radiomics predictive probabilities with conventionally available clinical factors. Its results are significantly better than those of the single DL and radiomics models; thus, it can be used as a translatable clinical tool that is easy to implement. In decision curve analysis, the combined model yielded a higher net benefit across most threshold probabilities, indicating that using it to guide treatment strategy could lead to better clinical outcomes. Therefore, the clinical DL radiomics model based on SMI can effectively distinguish IgAN from non-IgAN, guide the treatment plan, and help implement personalized treatment.
Our research has some limitations. First, the sample size was relatively limited, and the results may have some deviation; in future research, we will further expand the number of cases. Second, all data were from one institution, and only data from one equipment supplier were used. Third, the model has not been verified in an external independent cohort.
In conclusion, we propose a nomogram diagnosis model based on DL radiomics and clinical features displayed by SMI. This model extracts the features from SMI-based images through a convolutional neural network and fuses these features with the radiomics and clinical features to distinguish IgAN from non-IgAN. The results showed that the DL radiomics model based on SMI could effectively distinguish IgAN patients from non-IgAN patients. This model has good clinical applicability. The model provides an easy-to-use and personalized tool for noninvasive diagnosis of IgAN and can help physicians develop a more beneficial treatment plan for patients.

Figure 1. The patient recruitment pathway and the inclusion and exclusion criteria.

Figure 2. The flow chart of the combined nomogram model integrating deep learning radiomics.

Figure 3. Spearman correlation coefficients were calculated for the eight selected features in the radiomics model, shown with their respective coefficient values. A feature with a longer extent along the x-axis played a more important role in the model.

Figure 4. The receiver operating characteristic (ROC) curves of the five models. (A) Five ML model ROC curves in the training cohort. (B) Three ML model ROC curves in the testing cohort.

Figure 5. Spearman correlation coefficients were calculated for the seven selected features in the DL model, shown with their respective coefficient values. A feature with a longer extent along the x-axis played a more important role in the model.

Figure 6. The combined nomogram model. (A) The values of the DL score, Rad score, and clinical characteristics can be converted into quantitative values according to the points axis. After summing the individual points to obtain the total shown on the total points axis, the estimated probability of IgA nephropathy is read off. (B) Calibration plots of the nomogram for predicting IgAN.

Figure 7. The decision curve analysis for the five models. (A) The decision curve analysis in the training cohort. (B) The decision curve analysis in the testing cohort.

Table 2. Performance of the three models.